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The genes POU5F1 and SOX2 are critical for pluripotency and reprogramming, yet the chromosomal 
organization around these genes remains poorly understood. We assayed long-range chromosomal 
interactions on putative enhancers of POU5F1 and SOX2 genes in human embryonic stem cells (hESCs) 
using 4C-Seq technique. We discovered that their frequent interacting regions mainly overlap with early 
DNA replication domains. The interactomes are associated with active histone marks and enriched with 
5-hydroxymethylcytosine sites. In hESCs, genes within the interactomes have elevated expression. 
Additionally, some genes associated with the POU5F1 enhancer contribute to pluripotency. Binding sites 
for multiple DNA binding proteins, including ATF3, CTCF, GABPA, JUND, NANOG, RAD21 and YY1, 
are enriched in both interactomes. The RARG locus, frequently interacting with the POU5F1 locus, has 
abundant RAD21 binding sites co-localized with other protein binding sites. Thus the interactomes of these 
two pluripotency genes could be an important part of the regulatory network in hESCs. 



Embryonic stem (ES) cells are pluripotent stem cells characterized by unlimited self-renewal capacity and the 
ability to differentiate into all three germ layers. These cells are considered promising in cell replacement 
therapy. The development of induced pluripotent stem (iPS) cell technology, in which overexpression of 
four transcription factors (OCT4/POU5F1, SOX2, KLF4, and MYC) reverts somatic cells such as skin fibroblasts 
to the pluripotent state and generates ES-like iPS cells, further propelled the field of regenerative medicine and 
stem cell therapy 1 . These reprogramming factors eventually activate the expression of a class of pluripotency 
genes. Extensive efforts have been made to understand the epigenetic regulation of genes related to stem cell self- 
renewal, pluripotency and reprogramming. Currently, DNA methylation 2 , histone modification 3 " 7 and transcrip- 
tion factor binding 811 are recognized as critical factors in chromatin structure modulation and gene regu- 
lation 1213 , and pluripotent stem cells (PSCs) exhibit distinct DNA methylation patterns and histone 
modifications compared to differentiated cells. In addition, a characteristic nuclear architecture has been estab- 
lished as a unique epigenetic feature of PSCs 14 ; thus long-range chromatin interactions have been postulated to 
play critical roles in maintaining pluripotent cell function. However, detailed chromatin-interaction maps 
remain to be characterized to understand the functional importance of the chromatin interactomes, such as 
the crosstalk between enhancer elements and distally targeted genes. 

In the last several years, chromosome conformation capture (3C) has emerged as a high-throughput molecular 
biology approach to explore long-range chromatin interactions. Newer techniques such as Hi-C and 5C detect 
global chromatin interactions 1516 . In combination with next generation sequencing, Hi-C unveiled interesting 
chromosomal organization such as open and closed chromatin compartments in the nucleus. A recent study 
revealed topological domains of the mammalian genomes from 3C based high-resolution chromatin interaction 
maps 17 . Other methods like ChlA-PET uncovered interactions mediated by particular DNA binding proteins 18 . 
All of these methods provide novel insights into global higher-order chromosomal structure; however, they only 
detect limited interacting partners for a specific regulatory element. Circular chromosome conformation capture 
(4C) with restriction enzyme fragmentation and proximity-based ligation was designed to detect all unknown 
sites interacting with a known "bait" sequence 19 " 22 . In addition, a sonication-based 4C library preparation method 
was developed to capture strong chromatin interactions of the bait sequence 18 ' 23 . Together with next-generation 
sequencing, this technology enables genome-wide identification of strong interacting partners of a regulatory 
element such as an enhancer. 
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During early embryogenesis, enhancer elements marked with dif- 
ferent chromatin signatures either activate or suppress transcription 
of nearby genes 6 . In hESCs, upstream putative enhancers of the 
POU5F1 and SOX2 genes are associated with p300 and H3K27ac 7 . 
The upstream distal enhancer of the POU5F1 gene is a well-known 
regulator of OCT4 expression in PSCs via looping 13 ' 24 . It has been 
reported that enhancer elements can regulate genes several hundred- 
thousand base pairs away 25 ' 26 ; thus, we hypothesize that POU5F1 and 
SOX2 enhancers may play long-range regulatory roles in addition to 
activating nearby genes. To test this hypothesis, we utilized a high- 
throughput assay to detect POU5F1 and SOX2 enhancer-associated 
long-range interactions. With extensive data analysis, we aim to 
understand the roles of these long-range interactions in the regula- 
tory transcriptional circuitry that governs stem cell self-renewal and 
pluripotency. 

Results 

Identification of POU5F1 and SOX2 enhancer interactomes. The 

putative POU5F1 and SOX2 enhancer regions are evolutionary 
conserved across vertebrates (44 species), with active enhancer signa- 
tures defined by epigenetic marks such as p 3 00 histone acetyltrans- 
ferase and H3K27ac but not H3K27me3 in hESCs 7 (Supplementary 
Fig. 1A). We applied the 4C technique to identify the interacting 
partners of the "bait" putative enhancers of POU5F1 and SOX2 in the 
pluripotent H9 cell line. Briefly, the cells were fixed to crosslink 
chromosomes within proximity. The chromosomes were then frag- 
mented by sonication. Interacting chromatin fragments were ligated 
and the DNA pieces were purified. Genomic regions interacting with 
the "bait" were then amplified by nested PCR (Supplementary Fig. 
IB, Supplementary Table 1). We designed inverse PCR primers to 
target putative enhancers of the POU5F1 and SOX2 genes. The 
constructed 4C library could be visualized by DNA electrophoresis, 
whereas the control without ligation showed almost no PCR 
products (Supplementary Fig. IB), indicating that the 4C library 
was amplified from ligation products. 

We performed next-generation DNA sequencing using an 
Illumina HiSeq sequencer, and classified the identified 4C interac- 
tions as either proximal or distal interactions (see Methods). Distal 
interactions include inter- chromosomal interactions and intra- 
chromosomal interactions over genomic distances greater than 
20 kb, whereas proximal interactions cover intra-chromosomal regions 
with genomic distances between 300 bp and 20 kb. Consistent with 
the results of 3C-based studies, our distal interaction reads account 
for only a small portion of the total interactions, roughly 10% ~ 20% 
for the 4C libraries (Supplementary Table 2). We used two different 
batches of H9 cells to construct the 4C libraries for POU5F1 and 
SOX2 independently for next- generation sequencing. The two repli- 
cates of POU5F1 and SOX2 distal interactions overlap by 35% and 
25%, respectively. This is consistent with the moderate overlap 
observed in biological replicates of CTCF-mediated chromatin inter- 
actomes in mouse ES cells 27 , and may reflect the diversity and 
dynamics of enhancer interactions, ligation efficiency, complexity 
of PCR amplification as well as sequencing depth. To filter out 
interactions that likely resulted from stochastic effects, we used a 
false discovery rate (FDR) -based statistical model to identify the 
enriched interacting domains across the whole genome. We further 
merged the enriched interacting domains that overlap between the 
biological replicates as being high- confidence frequent interacting 
domains. In total, 23 and 9 high-confidence frequent interacting 
domains were identified for the POU5F1 and SOX2 interactomes, 
respectively (Figure 1, Supplementary Table 3, 4), and most of them 
are inter- chromosomal interactions that involve gene-rich regions, 
similar to the e4C results 20 . 

The interactomes of POU5F1 and SOX2 enhancers overlap with 
early DNA replication domains but not with heterochromatic 




Figure 1 | Circos maps of the distal interactions are presented using 
Circos software 52 . The blue outer ring represents the gene density profile. 

regions. We next examined whether the POU5F1 and SOX2 en- 
hancer interacting domains (size ranging from 1 Mb to 4 Mb, 
Supplementary Table 3, 4) have any unique epigenetic features. As 
Ryba et al. showed 28 , DNA replication timing correlates with the 
spatial proximity of chromatin as measured by Hi-C analysis, sug- 
gesting that early and late DNA replication occur in spatially distinct 
compartments in the nucleus. We noticed that the POU5F1 and 
SOX2 gene loci are within early DNA replication domains in 
hESCs, and wondered whether their interacting domains have a si- 
milar DNA replication timing. As shown in Figure 2 A, the POU5F1 
and SOX2 enhancer interacting domains mainly overlap with early 
DNA replication domains in hESCs. The distribution of assayed 
replication timing values within the genomic regions surrounding 
the identified POU5F1 and SOX2 enhancer-interacting sites (50 kb 
range) shows a shift toward positive values compared to genome- 
wide array values (P < 2.2 X 10" 16 for both, by unpaired Wilcoxon 
rank-sum test; Figure 2B). This observation revealed an interesting 
epigenetic feature, that higher- order structures surrounding the 
POU5F1 and SOX2 enhancers have synchronized DNA replication 
timing. As early DNA replication domains have been connected to 
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A Distance Distribution of the Interactomes to the closest TSS 
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Figure 2 | Relationship of the interactomes with DNA replication 
domains and heterochromatic regions. Interacting sites within the 
frequent interacting domains are presented in (2A) (only Chrl and Chr2 
are shown). (2B), Comparison of distribution of DNA replication timing 
(RT) array values for the regions within ±50 kb of the interacting sites to 
the whole genome. 

active gene transcription in hESCs 28 , it is likely that the interactions 
of POU5F1 and SOX2 enhancers with the identified distal regions 
have functional significance. This observation is consistent with what 
we have seen in POU5F1 enhancer interactome in mouse ES cells 
(data not shown). Also revealed in Figure 2 A, POU5F1 and SOX2 
interacting domains do not overlap with heterochromatic regions 
labeled with strong H3K9me3 signals in hESCs 29 , suggesting that 
the interactomes are within an active portion of the genome in 
terms of gene regulation. In addition, we explored a potential 
relationship with the nuclear lamina-associated domains (LADs) 
of human chromosomes 30 , but did not observe any connection, 
possibly due to the fact that the LADs were determined in human 
fibroblasts. 

The enhancer interactomes are adjacent to transcription start sites 
and possess active epigenetic features. To explore the correlation of 
POU5F1 and SOX2 enhancer interactomes with annotated gene 
locations, we analyzed the distribution of the genomic distance of 
the enhancer interacting sites to the closest gene transcription start 
site (TSS) (Figure 3A). The kernel density plots of both POU5F1 and 
SOX2 enhancer-interacting sites clearly show a sharp peak centered 
at TSS. When compared to the distribution peak of randomly 
simulated sites in the whole genome, the peaks observed in the en- 
hancer interactomes are significantly higher within a narrow window 
around TSS (P = 0.0018, P = 0.0047, respectively, Kolmogorov- 
Smirnov test). Enrichment of the interacting sites around TSS 
suggests that POU5F1 and SOX2 enhancers prefer interacting with 
genomic regions containing genes. We further investigated the 
distribution of POU5F1 and SOX2 enhancer-interacting sites at 
different genomic regions 31 . Shown in Figure 3B, gene promoter 
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Figure 3 | (A) Kernel density estimation of genomic distance from the 
interacting sites to the nearest TSS sites. Distances were calculated using 
HOMER software 34 . Density maps of 100,000 random sites in the whole 
genome are also included. (B) Enrichment analysis of the interactomes at 
different genie regions. Distribution analysis was performed using CEAS 
software 31 . 

sequences are one of the enriched regions for both the POU5F1 
and SOX2 enhancer interactomes. Additionally, the fraction of 
distal intergenic regions is consistently reduced for the two 
interactomes. Therefore, our observation suggests that the higher- 
order chromosome structure surrounding the active enhancers may 
be directly involved in gene regulation. Also noted, when random 
sites in the early DNA replication areas (see Methods for details) are 
selected to compare distance distribution to TSS, the POU5F1 and 
SOX2 interactomes are still more enriched around TSS, although 
statistically not significant (P = 0.390,1 P = 0.2098, respectively, 
Kolmogorov-Smirnov test, Supplementary Fig. 2). 

Histone modification is known to be involved in transcriptional 
regulation by decondensing compact chromatin structures for tran- 
scription factor binding. 3C-Based genome-wide interactome stud- 
ies have identified active and inactive chromatin compartments in 
the nucleus, with active compartments enriched with histone marks 
that activate gene transcription, such as H3K9ac 32 . We therefore 
asked whether active enhancers of POU5F1 and SOX2 selectively 
interact with distal regions showing active gene transcription. 
ChlP-Seq tag profiles of H3K27me3, H3K4mel, H3K4me2, 
H3K27ac, H3K9ac, H4K20mel, and H3K36me3 in HI ES cell line 33 
were used for analyzing histone patterns associated with the inter- 
actomes. ChlP-Seq tags around the enhancer interacting sites were 
counted and normalized 34 , and visualized as boxplot. As noted, 
POU5F1 and SOX2 interactomes have higher intensity profiles of 
H3K4mel, H3K4me2 and H4K20mel, which are marks for gene 
active regulation (Figure 4A). Compared to the random sites in early 
DNA replication areas, the investigated histone marks are in general 
more enriched in the POU5F1 interactome than in the SOX2 inter- 
actome, with H3K4mel and H3K9ac are the most significantly 
enriched ones in the POU5F1 interactome (P < 2.2 X 10" 16 for both 
marks, unpaired Wilcoxon rank- sum test). Interestingly, Polycomb- 
mediated gene silencing mark H3K36me3 35 is not enriched in either 
interactome (P = 0.485, P = 0.922, respectively). We also investi- 
gated the relationship of POU5F1 and SOX2 enhancer interactomes 
with 5-hydroxymethylcytosine (5-hmC) sites in the genome. As a 
previous study revealed, genomic 5-hmC sites are enriched at ac- 
tive enhancers in hESCs 36 . Because POU5F1 and SOX2 upstream 
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Figure 4 | (A) Boxplot of different histone marks around the interacting sites and random sites. ChlP-Seq tags within ± 5 kb from an interacting site were 
counted and normalized. (B) Distance distribution of the interacting sites and random sites to the closest 5-hmC site were compared. 100,000 random 
sites were generated from early DNA replication areas for comparison. 



enhancers are adjacent to 5-hmC sites (within 5 kb), we asked 
whether the enhancer interactomes are also enriched with 5-hmC 
sites. Shown in Figure 4B, 5-hmC sites are enriched in the proximity 
of POU5F1 and SOX2 interactomes compared to the random sites in 
early DNA replication areas (P < 2.2 X 10" 16 , P = 0.047, respect- 
ively, Kolmogorov-Smirnov test). Thus, the two enhancer interac- 
tomes in hESCs possess epigenetic features of active enhancers, 
which may facilitate active regulation of gene transcription. 

Transcriptional status of POU5F1 and SOX2 enhancer interacting 
genes. To understand the transcriptional status of genes associated 
with POU5F1 and SOX2 enhancers, we analyzed RNA-Seq 
transcriptome data in hESCs and fetal lung fibroblasts 37 , focusing 
on genes that have a TSS within 10 kb of the enhancer-interacting 
sites. In hESCs, expression of POU5F1 and SOX2 enhancer- 
interacting genes is significantly higher than that of all genes in 



hESCs (P = 7.5 X 10" 6 and P = 1.2 X 10" 6 , respectively, by 
unpaired Wilcoxon rank- sum test; Figure 5A), indicating that the 
interactomes are enriched with actively transcribed genes. Thus, 
active POU5F1 and SOX2 enhancers could be involved in 
mediating transcription of the associated distal genes. RNA-Seq 
transcriptome analysis also revealed that the genes originally 
associated with active POU5F1 and SOX2 enhancers in hESCs are 
down-regulated in differentiated lung fibroblasts (P = 1.2 X 10~ 5 
and P = 0.013, respectively, by paired Wilcoxon rank- sum test; 
Figure 5B). These results indicate that the enhancers of these two 
sternness-related genes interact with a group of genes that is highly 
expressed in hESCs but repressed in fibroblasts. 

Distal genes associated with the POU5F1 enhancer are involved in 
the regulatory circuitry of pluripotency. To explore whether 
POU5F1 and SOX2 enhancer-interacting genes contribute to 
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Figure 5 | (A) Comparison of RPKM values for the interacting genes (genes with a TSS < 10 kb from the interacting sites) with all human Ensembl genes 
(average RPKM values from replicate RNA-Seq data in ENCODE were used). (B) Differential expression was analyzed to compare interactome genes in 
hESCs and fetal lung fibroblasts. 



self-renewal and pluripotency of hESCs, we analyzed a data from a 
genome- wide RNA interference screen using a POU5F1 promoter 
reporter assay in hESCs 38 . Interference of transgenic POU5F1 -GFP 
expression is quantified with Fav values reflecting GFP fluorescence 
intensity, which represents the propensity to be sternness -related 
(Figure 6A). In comparison to all the genes screened, POU5F1 
enhancer-interacting genes (genes closest to the interacting sites, 
with genomic distance from TSS to the interacting sites less than 
10 kb) show a trend of decreasing Fav values, although statistically 
not significant (P = 0.08, unpaired Wilcoxon rank-sum test). How- 
ever, for SOX2-targeting genes, the Fav values are similar to all the 
genes screened (P = 0.98, unpaired Wilcoxon rank- sum test). There- 
fore, based on these data, genes in the POU5F1 interactome appear to 
be more important for stem cell pluripotency than genes in the SOX2 
interactome. 

Gene ontology analysis 39 of 39 POU5F1 -interacting genes and 20 
SOX2-interacting genes revealed that a significant portion of them to 
be involved in transcriptional regulation (30.8% and 35.0% for the 
POU5F1 and SOX2 interactomes, respectively; Supplementary Table 
5, 6), and distributed to 8 and 4 different interacting domains in the 
POU5F1 and SOX2 interactomes, respectively, with no more than 3 
genes in one interacting domain. Differential expression analysis of 
these transcriptional regulators in hESCs and fetal lung fibroblasts 
revealed 9 of those in the POU5F1 interactome (11 of 12 identified 
transcriptional regulators have RNA-Seq data) to have higher 
expression in hESCs (Figure 6B), suggesting active roles for these 
genes in the pluripotency regulatory network in hESCs. For those 
transcriptional regulators in the SOX2 interactome, 3 of 5 showed 
increased expression in hESCs. Therefore, the POU5F1 enhancer 



interactome may play a greater role in stem cell pluripotency than 
the SOX2 enhancer interactome. 

Several transcription factor binding sites are enriched in POU5F1 
and SOX2 enhancer interactomes. Transcription factor binding is 
one of the characteristics that enhancers possess, and it may facilitate 
long-range chromatin-chromatin interactions of the enhancer with 
distal genes. POU5F1 and SOX2 putative enhancers are known hot 
spots for association of multiple DNA binding proteins in hESCs. To 
understand whether certain transcription factors are enriched in 
POU5F1 and SOX2 enhancer-interacting regions, we systemati- 
cally analyzed ChlP-Seq profiles of 32 DNA binding proteins in 
hESCs (see Methods). Normalized ChlP-Seq tag counts around the 
center of the 4C interacting sites were calculated, and visualized using 
boxplot (Figure 7 A). As a control, the counts around the center of the 
random sites in the early DNA replication areas were also calculated. 
Statistical analysis identified ATF3, CTCF, GABPA, JUND, 
NANOG, RAD21 and YY1 sites were the most enriched proteins 
in both enhancer interactomes compared to the control (P < 0.01, 
unpaired Wilcox rank-sum test). However, pluripotency-related 
transcription factor OCT4 is not enriched in either POU5F1 or 
SOX2 interactome. Therefore, ATF3, CTCF, GABPA, JUND, NAN- 
OG, RAD21 and YY1 are the characteristic proteins in POU5F1 and 
SOX2 enhancer interactomes in hESCs, and their presence may be 
functionally important. 

RAD21 is potentially in complex with other transcription factors 
to mediate the enhancer interactomes. As shown in Figure 7A, 

RAD21 is one of the enriched proteins identified in POU5F1 and 
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Figure 6 | (A) Correlation of the interactome genes with pluripotency. The interacting genes screened in a siRNA assay using POU5F1-GFP reporter gene 
in hESCs were included for analysis. Distribution of Fav values (change of fluorescence) for the interacting genes are compared to all 21 122 genes screened 
in the original assay. (B) Analysis of differential expression of identified transcriptional regulators in hESCs and fetal lung fibroblasts. 



SOX2 enhancer interactomes in hESCs. RAD21 is part of a cohesion 
complex known as a DNA chain crosslinker. Cohesin has been 
shown to mediate looping of the Pou5fl distal enhancer with the 
Pou5fl gene promoter in mouse ES cells 40 . Consistent with a pre- 
vious report that Rad21 cooperates with Ctcf and other transcription 
factors in maintaining mouse ES cell identity 40 , RAD21 sites in both 
POU5F1 and SOX2 interactomes in hESCs are co- occupied by 
transcription factors. Aggregation plots clearly illustrate the peak 
signals of ATF3, CTCF, GABPA, JUND, NANOG and YY1 
binding profiles at the center of RAD21 sites in POU5F1 and SOX2 
interactomes (Figure 7B). Knockdown of Gabpa in mouse ES cells is 
known to reduce Oct3/4 expression, whereas overexpression of 
Gabpa maintains expression of Oct3/4 even without LIF in mouse 
ES cell culture 41 . In a genome- wide siRNA screen in hESCs 38 , knock- 
down of most of the enriched proteins, such as RAD21, CTCF, 
GABPA, JUND, NANOG and YY1, significantly reduced the expres- 
sion of a transgenic POU5F1 -promoter linked to a GFP reporter, 
with Fav values of -1.50421, -1.05828, -1.44161, -1.07731, 
-1.82749, -2.57204, respectively (within the top 25% of all of the 
genes screened). To further understand the chromatin- chromatin 
interactions, we applied DNA fluorescent in-situ hybridization 
(FISH) to visualize co -localization of two genomic loci at the single 
cell level. The RARG gene locus on chromosome 12 has been iden- 
tified in the POU5F1 interactome in hESCs. RARG was recently 
shown to play a very important role in pluripotency and can 
enhance iPS cell induction by two magnitudes 42 . Another locus on 
chromosome 12 that is not part of the POU5F1 interactome was used 
as a control. The co -localization frequency between the RARG 
locus (chrl2: 51751589-51956291) and the POU5F1 locus 
(chr6: 31169844-31340561) was found to be 1.67-fold higher than 
the frequency between the control locus (chrl2: 80380971- 
80564119) and the POU5F1 locus (9.35% versus 5.61%, P < 0.05, 



Supplementary Fig. 3A). The higher contact frequency between the 
RARG and POU5F1 loci is consistent with our sequencing data, and 
suggests that more stable interactions are formed between the two 
loci. Examination of the RARG and control loci (Supplementary Fig. 
3B) revealed that there are more RAD21 and other transcription fac- 
tor binding sites in the RARG locus with frequent co-localization. 
Coincidence of chromosome interaction frequency with RAD21- 
containing binding sites for multiple transcription factors suggests 
that RAD21 may cooperate with other transcription factors to orga- 
nize long-range chromatin -chromatin interactions. Thus, RAD21, 
together with multiple transcription factors, is likely to contribute 
to higher- order chromosomal structure surrounding POU5F1 and 
SOX2 enhancers in hESCs as part of a pluripotency network. 
However, additional molecular studies are needed to confirm this 
hypothesis derived from 4C-Seq study. 

Discussion 

To our knowledge, interactome maps of the POU5F1 and SOX2 
enhancers in hESCs are presented for the first time, obtained by 
combining 4C with next-generation sequencing. The interactome 
data show that the putative enhancers of POU5F1 and SOX2 can 
physically associate with distal active genomic regions, which could 
be an important part of the regulatory network in hESCs. 

Previous studies revealed that putative enhancer elements are 
marked with H3K27ac histone modification 6 and binding of p300 
histone acetyltransferase 7 . The binding sites of p300 have accurately 
identified novel enhancers with tissue -specific activities in transgenic 
mouse assays 43 . Histone modifications at the enhancers are cell type- 
specific, and regulate genome-wide gene expression 7 . Classical 
enhancers are located immediately upstream of a promoter for regu- 
lating transcription, but more evidence indicates that enhancer 
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elements distal to the target gene can loop with the target gene 44 . A 
recent study also revealed extensive promoter-promoter interac- 
tions, both intra-chromosomally and inter-chromosomally 45 . For 
POU5F1 and SOX2 putative enhancers marked with p300 in 
hESCs, we observed their preferential interactions with distal gene- 
rich regions that are associated with enhancer histone marks, sug- 
gesting that the regulatory effects of the two enhancers are not 
restricted to their nearby POU5F1 and SOX2 genes, and may involve 
long-range contact with other distal genes. Consistently, the 
interactomes are enriched with 5-hmC sites that are associated with 
active enhancers in hESCs 36 ' 46 . Our observations of enhancer 



interactomes support genome- wide long-range interactions between 
the enhancers and the targeted regions. These long-range interac- 
tions are likely to be dynamic and transient. For example, regarding 
the RARG locus on chromosome 12 that interacts with the POU5F1 
locus on chromosome 6, only around 10% of total cells exhibit a 
specific interaction between the two loci. 

Chromatin-chromatin interactions construct a higher- order 
chromosomal structure that is part of a pluripotency network in 
ES cells. The Nanog gene locus in mouse ES cells is enriched with 
DNasel hypersensitivity sites, facilitating pluripotency- related tran- 
scription factor binding which serves to organize a higher- order 
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chromosomal structure that may have functional roles 47 . DNA rep- 
lication domains are also structured in the nucleus, with early and 
late DNA replication domains correlating nicely with different com- 
partments of higher-order chromosomal structure in ES cells 28 . For 
POU5F1 and SOX2 enhancer mediated higher- order chromosomal 
structure, we observed strong association with early DNA replication 
domains, active enhancer epigenetic marks, and activated transcrip- 
tion of interacting genes. Furthermore, a portion of the genes tar- 
geted by the POU5F1 enhancers is pluripotency- related and involved 
in transcriptional regulation in hESCs. 

Our results also suggest that several DNA binding proteins might 
mediate the POU5F1 and SOX2 enhancer interactomes. We observed 
abundant inter- DNA chain linker cohesion protein RAD21 and 
looping protein CTCF together with pluripotency- related transcrip- 
tion factors such as GABPA, JUND, NANOG and YY1. RAD21 
mediates DNA looping, particularly between enhancer and promoter 
regions in ES cells 40 . CTCF, one of the most studied insulator pro- 
teins, also mediates long-range interactions that are functionally 
important to pluripotent cells 27 . For the enhancer interactomes in 
pluripotent cells, RAD21 may complex with CTCF and other 
transcription factors, thus potentially bridging distal chromatins and 
mediating interactions between regions like enhancers and promoters. 
Co -localized transcription factors such as YY1 may facilitate ordered 
recruitment of chromatin modifying enzymes to regulate higher- order 
chromosomal structure 48,49 . Thus, pluripotency-related transcription 
factors may coordinate with structural proteins RAD 21 and CTCF to 
mediate POU5F1 and SOX2 enhancer interactomes in hESCs. 

Collectively, we characterized POU5F1 and SOX2 putative enhan- 
cer interactomes in hESCs to understand long-range enhancer inter- 
actions with the targeted regions. However, the data from the next- 
generation sequencing are merely average observations across a large 
group of cells, therefore, the identified interacting regions should not 
be viewed as interacting with POU5F1 or SOX2 enhancer simulta- 
neously. The low concordance between biological replicate data from 
sonication based 4C approach is similar to the results from ChlA- 
PET. Also consistent with other sonication based 4C studies reported 
in Yijun Ruan group, the identified sites are either in close proximity 
to the bait or distributed across the whole genome, which is different 
from enzyme-based 4C approach. Most likely, the sonication based 
4C approach only captures strong chromatin-chromatin interac- 
tions. Distal chromosomal interactions are infrequent compared to 
proximal interactions in the nucleus, thus many of the detected distal 
sites associated with the "bait" might not be biologically meaningful. 
Therefore, statistical analysis is necessary to filter out random noise 
and identify enriched interacting regions. It is expected that with 
increasing sequencing depth, higher concordance between biological 
replicates can be achieved. 

Methods 

Cell culture and chromatin preparation. H9 hESCs were obtained from the USC 
stem cell core center under feeder-free conditions. Around 2 million cells were 
cross-linked with 1 % freshly prepared formaldehyde for 1 0 min at room temperature. 
The reaction was quenched by addition of ice-cold glycine to a final concentration of 
0.125 M and incubated on ice for 5 min. The cell pellets were collected and 
re-suspended in 10 mL Triton XI 00 buffer (0.25% Triton XI 00, 10 mM EDTA 
pH = 8.0, 10 mM Tris-HCl pH = 8.0, 100 mMNaCl, IX protease inhibitor cocktail) 
and rotated at 4°C for 30 min. The mixture was then centrifuged at 3000 rpm for 
5 min at room temperature. The collected chromatin pellets were re- suspended in 
600 uLSDSlysisbuffer(l%SDS,5 mM EDTA pH = 8.0,50 mM Tris-HCl pH = 8.0, 
1 X protease inhibitor cocktail) and sonicated to an average size of 500 bp using a 
Branson Digital Sonifier. After a final centrifugation at 14000 rpm for 10 min at 4°C, 
the supernatant was collected and saved at — 80°C. 

4C library preparation. The chromatin fragments were diluted 5 -fold in 10 mM 
Tris-HCl pH = 8, mixed with Triton X100 (1.7% final concentration) and rotated at 
37°C for 2 h. The mixture was blunt-end repaired (NEB) for 30 min at room 
temperature and then ligated under diluted conditions (chromatin corresponding to 
—70,000 cells per mL of ligation mixture) for 24 h at 4°C. Reversal of crosslinking was 
carried out at 65°C for 20 h with proteinase K (0.7 mg/mL final concentration). 
Protein-free DNA was extracted with phenobchloroform (1:1) and precipitated in 



70% EtOH, 0.3 M NaAcetate pH = 5.6 at -80°C. After centrifugation at 14000 rpm 
for 45 min, the DNA pellets were washed with 70% EtOH, air-dried and dissolved in 
10 mM Tris-HCl pH = 8.0. The resuspended DNA was further treated with RNase A 
before purification using a MinElute Reaction Cleanup Kit. 

The 4C libraries were amplified by nested inverse PCR using 2 sets of PCR primers 
and KOD hot- start DNA polymerase. Each of the two rounds of PCR ran 20 cycles for 
amplification. PCR products were purified using a MinElute PCR Purification Kit. In 
the final step, the DNA fragments were sonicated to an average size of 200 bp using a 
Covaris sonicator before Illumina next-generation sequencing. 

Generation and processing of 4C-Seq data. The paired- end sequencing data from 
the Illumina HiSeq platform were processed as follows. 20 bp tags from the paired- 
ends were aligned to the hgl8 assembly separately using BWA aligner 50 . Uniquely 
mapped paired tags were processed to capture 4C junction sites formed between the 
bait putative enhancer region and its interacting partners, classified as distal and 
proximal interactions. Distal interactions included intra-chromosomal junctions 
with genomic distance greater than 20 kb and inter-chromosomal junctions, whereas 
proximal interactions covered intra-chromosomal junctions with a genomic distance 
between 300 bp and 20 kb. One additional filter was used to remove the tags close to 
the reads enclosed by simple repeating elements (http://hgdownload.cse.ucsc.edu/ 
goldenPath/hgl8/database/). Finally, tags within a 100 bp range were merged as a 
unique interacting site. Singlet sites were removed before further analysis, following 
previous studies 18,21 . 

Statistical model to identify enriched interacting domains. A statistical model was 
built to identify interacting regions that exhibit a higher frequency of interactions 
than expected from random collision 21 . For every interacting site i on chromosome W 
(length L w ), the number of interacting sites within a certain window w with length l w 
(we chose ± 1 Mb in the study) from the analyzed site was counted as C i>w , and a 
z- score was calculated as an enrichment score: 

vW(!- p w) 

in which p w = U-w/lw (M-w is the expected number of interacting sites in window w on 
chromosome W). 

We further assigned statistical significance to each interacting site by a FDR-based 
approach. Briefly, we randomly permutated calculated z-score data for every 
chromosome 100 times, and chose interacting sites with a false discovery rate (FDR) 
< 5% as positively interacting sites. FDR for each site was calculated by counting the 
number of randomly permutated Z- scores that are above experimentally determined 
Z-score. All the interacting sites within ± 1 Mb range of a positively interacting site 
were clustered as an enriched interacting domain. Overlapping interacting domains 
were further merged together. 

DNA-FISH. FISH probes labeled with Cy5 and Alexa-488 dyes were prepared 
in-house. Probes mixed with human Cot- 1 DNA and salmon sperm DNA were 
denatured for 7 min at 75C. Competition was carried out for 30 min at 37C. 
Formaldehyde fixed H9 cells grown on glass covers were dehydrated with 80%, 90% 
and 100% ice-cold ethanol sequentially, each for 5 min on ice. The air-dried samples 
were permeabilized in 0.2 N HCl/0.2% Tween 20 for 10 min at room temperature, 
followed by neutralization with CMF-PBS/0.2% Tween. The cells were then 
denatured in 70% formamide/2 X SSC (pH = 7.2) for 10 min. at 80°C and 
dehydrated again with 80%, 90% and 100% ice-cold ethanol sequentially. The 
hybridization reaction was carried out at 37°C overnight and fluorescent images were 
taken on a Zeiss 510 Meta two-photon system. The percentage of co-localization 
between POU5F1 and potentially interacting loci was compared with that between the 
POU5F1 locus and a negative control region on chromosome 12 with roughly 
four-hundred cells counted in each case. Statistical analysis was performed using 
Fisher's exact test with two-tails. 

ChlP-Seq, RNA-Seq and DNA replication timing data. The ChlP-Seq data for 
histone modifications and DNA binding proteins in HI hESCs were retrieved from 7 
and the ENCODE database (http://genome.ucsc.edu/ENCODE, GSE32465). The 
transcriptome data for HI hESCs and AG04450 fetal lung fibroblasts were retrieved 
from the ENCODE database (http://genome.ucsc.edu/ENCODE). DNA replication 
timing array data for H9 hESCs was downloaded from the Replication Domain 
Database (http://www.replicationdomain.com). After data smoothing, segmentation 
of replication timing was performed by DNAcopy (Bioconductor) using the 
parameters defined in the previous study 51 . The segments with mean replication 
timing ratio above zero are defined as early replication areas. 

Statistical analysis. Statistical tests were executed using the R software suite 
(http://www.r-project.org/). 
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